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Abstract — This  paper  addresses  the  problem  of  calculating  the 
multidimensional  probability  density  functions  (PDFs)  of  statistics 
derived  from  known  many-to-one  transformations  of  independent 
random  variables  (RVs)  with  known  distributions.  The  statistics 
covered  in  the  paper  include  reflection  coefficients,  autocorrela¬ 
tion  estimates,  cepstral  coefficients,  and  general  linear  functions  of 
independent  RVs.  Through  PDF  transformation,  these  results  can 
be  used  for  general  PDF  approximation,  detection,  classification, 
and  model  order  selection.  A  model  order  selection  example  that 
shows  significantly  better  performance  than  the  Akaike  and  MDL 
method  is  included. 

Index  Terms — Classification,  class-specific  features,  PDF  estima¬ 
tion,  sufficient  statistics. 


I.  Introduction 

IN  THIS  paper,  we  present  approximations  of  multidimen¬ 
sional  probability  density  functions  (PDFs)  for  statistics  de¬ 
rived  from  the  standard  normal  distribution.  Let  z  =  T(x), 
where  x  is  a  vector  of  independent  and  identically  distributed 
(iid)  samples  of  zero-mean  Gaussian  noise  of  unit  variance.  The 
feature  extraction  function  T( )  can  be  any  useful  set  of  statis¬ 
tics.  The  challenge  is  to  accurately  evaluate  the  joint  multidi¬ 
mensional  PDF  of  z.  The  results  must  be  valid  everywhere,  in¬ 
cluding  the  tails  of  the  PDF.  We  show  that  the  results  can  be  used 
to  approximate  p(x | H i)  for  an  arbitrary  alternative  hypothesis 
Hi.  This  approach  has  applications  in  detection,  classification, 
and  model  order  selection. 

A.  Motivation  and  Previous  Work 

The  distribution  of  statistics  derived  from  purely  white 
Gaussian  noise  (WGN)  have  been  studied  in  the  past;  how¬ 
ever,  applications  have  been  limited  because  WGN  is  rarely 
encountered  in  practice.  An  important  application  of  the  WGN 
condition  is  as  the  null  hypothesis  in  testing  for  colored  noise. 
Tests  for  colored  noise  based  on  the  periodogram  [1]  and  serial 
autocorrelation  function  [2]-[4]  have  been  studied. 


Manuscript  received  September  26,  2000;  revised  June  14,  2001.  This  work 
was  supported  by  the  Office  of  Naval  Research.  The  associate  editor  coordi¬ 
nating  the  review  of  this  paper  and  approving  it  for  publication  was  Prof.  Jian 
Li. 

S.  M.  Kay  is  with  the  Department  of  Electrical  and  Computer  Engi¬ 
neering,  University  of  Rhode  Island,  Kingston,  RI  02881  USA  (e-mail: 
kay  @  ele .  uri .  edu) . 

A.  H.  Nuttall  and  P.  M.  Baggenstoss  are  with  the  Naval  Undersea  Warfare 
Center,  Newport,  RI  02841  USA  (e-mail:  p.m.baggenstoss@ieee.org). 

Publisher  Item  Identifier  S  1053-587X(01)07772-8. 


Since  the  introduction  of  methods  related  to  the  class-specific 
method  [5]-[7],  the  WGN  hypothesis  has  become  increasingly 
useful.  This  is  because  WGN  is  seen  not  as  a  hypothesis  to  be 
explicitly  tested  but,  rather,  as  a  reference  hypothesis  for  con¬ 
verting  likelihood  tests  into  likelihood  ratio  tests  in  a  way  similar 
to  the  “dummy”  hypothesis  of  Van  Trees  [8]  and  then  taking  ad¬ 
vantage  of  sufficient  statistics  on  a  class-by-class  basis.  In  par¬ 
ticular,  testing  hypothesis  Hi  against  FA  can  be  accomplished 
by  comparing  the  likelihood  ratios  (LRs)  according  to 

Kx|ffi)  <  P(x|g2)  , 

p(x\H0)  p(x.\H0) 

where  p(  )  denotes  a  PDF,  and  Ho  is  any  reference  hypothesis, 
such  as  the  WGN  case.  By  finding  class-specific  sufficient  sta¬ 
tistics  Zi  =  Ti(x)  and  Z2  =  T2(x),  (1)  can  be  reduced  to  the 
LR  comparison 


P(z i\Hi)  <  p{z2\H2) 
p(zi\Ho)  p(z2\H0) 

where  Zi  must  be  sufficient  for  H±  versus  Ho,  and  Z2  must  be 
sufficient  for  H  >  versus  Ho-  Note  that  only  the  low-dimensional 
numerator  PDFs  need  to  be  approximated  from  training  data. 
Clearly,  the  denominator  PDFs  p(z,i\Ho)  and  p(z2\Ho)  must 
be  evaluated,  which  is  the  topic  of  this  paper.  The  extension  to 
M  hypotheses  is  obvious. 

In  a  later  development,  a  theorem  that  extended  the  class-spe¬ 
cific  approach  to  the  case  when  sufficiency  of  the  statistics  could 
not  be  guaranteed  was  introduced  [9],  [10],  This  latter  theorem 
allows  the  PDF  of  a  set  of  statistics  to  be  converted  into  a  PDF 
of  the  input  data.  More  precisely,  let  zi  =  Tj(x)  be  any  mul¬ 
tidimensional  set  of  statistics  derived  from  the  raw  data  x.  Let 
p(zi\Hi)  be  an  approximation  to  the  PDF  of  Zi  under  hypoth¬ 
esis  Hi  -  Then,  the  PDF  of  x  under  Hi  can  be  approximated  by 


p(x\Hi) 


p(x.\H0ii) 

p(zi\H0,i) 


p(zi\Hi) 


at  zi  =  Tj(x)  (3) 


where  Ho,  1  is  a  fixed  reference  hypothesis  chosen  specially  for 
Hi-  According  to  Theorems  1  and  2  of  [9],  (3)  is  always  a 
PDF;  thus,  it  integrates  to  1  over  x  for  any  reference  hypoth¬ 
esis  Ho,  1  and  any  transformation  Ti(  ),  provided  p(z,i\Ho) 
meets  a  mild  positivity  requirement  [9].  More  precisely,  we  must 
have  p(z,i\Ho,  1)  >  0  whenever p (zi| H 1)  >  0.  While  no  addi¬ 
tional  requirements  are  needed  forp(x|fTi)  to  be  a  PDF,  the  pair 
(Ho,  1,  Tj(  ))  should  be  chosen  carefully  so  that  p(x.\Hi)  is  a 
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good  approximation  to  p(x\Hi).  In  particular,  if  z±  =  Ti(x)  is 
approximately  sufficient  for  distinguishing  Hi  from  ifo?  1 ,  and 
p(zi|ifi)  p(z1\H1),  then  p'x.  11 ;  >  p(x|ifi). 

Approximate  sufficiency  can  be  formally  defined  by  the  rela¬ 
tionship 


(RVs)  that  are  all  Gaussian  of  zero  mean  and  unit  variance.  The 
feature  set  whose  distribution  we  seek  is  denoted  z  =  T(x)  = 
[zi  •  ••  zm\ •  where  M  <  N  generally. 

II.  Some  Exact  Solutions 


p(x|-Hi)  _  p(zi\Hi) 
p(x\H0,i)  ~  p(zi\Ho,  1) 

although  in  practice,  approximate  sufficient  statistics  are  ob¬ 
tained  not  always  by  mathematical  analysis  but,  often,  by  expe¬ 
rience  and  intuition.  If  approximate  sufficient  statistics  zi  can  be 
found,  p(x|f?i)  can  be  approximated  simply  by  choosing  a  suit¬ 
able  reference  hypothesis  Hq:  i,  then  approximating  p(z,i  \Hi), 
and  finally  converting  this  PDF  into  a  PDF  of  x  using  (3).  This 
represents  a  new  general  method  for  PDF  approximation  and 
statistical  hypothesis  testing.  Using  (3),  a  classifier  may  then  be 
constructed  using  class-specific  features 


>(x|ff0,i)  ~ 

_p(zi|fT0,i)_ 


p(zi\Hi)  > 


'  p(*\Hq,  2)  ~ 
_p(z2\H0>2). 


p(z2\H2) 


(4) 


where  zi  =  T\  (x),  and  z2  —  T2(x).  The  extension  to  M  classes 
is  obvious.  This  classifier  requires  PDF  estimates  with  dimen¬ 
sion  only  as  high  as  the  largest  set  of  statistics  and  not  as  high 
as  the  total  number  of  all  statistics,  as  in  a  classical  classifier. 

In  this  paper,  we  limit  ourselves  to  a  particular  choice  of 
reference  hypothesis,  namely,  the  WGN  case  denoted  by  Ho- 
Choosing  the  WGN  hypothesis  for  Ho  has  many  advantages. 
First,  an  analytic  solution  forp(z|fTo)  is  often  tractable.  Second, 
the  condition  of  positivity  is  always  met.  Third,  the  sufficiency 
requirement  against  WGN  is  itself  often  a  sensible  requirement. 
Finally,  the  solutions  for  the  WGN  case  provided  herein  can 
often  be  easily  modified  for  arbitrary  Gaussian-based  distribu¬ 
tions.  When  a  common  reference  hypothesis  is  used,  the  classi¬ 
fier  simplifies  to 


Kxl-ffo) 

,p(  Zll-ffo) 


p(zi\Hi)  > 


'  p(x\H0)  ~ 
_p(z2\H0)_ 


p(z2\H2). 


(5) 


This  classifier  is  identical  to  (2),  but  it  has  a  different  interpre¬ 
tation. 


B.  Need  for  Accurate  PDF  Approximations  in  the  Tails 

Because  Ho  is  a  fixed  reference  hypothesis  determined  prior 
to  the  measurement  of  data,  it  is  possible  that  the  actual  data 
lies  on  the  distant  tails  of  the  PDF  p(x\Ho).  This  is  also  true 
at  the  output  of  the  feature  transformations.  Thus,  it  is  possible 
that  both p(zi  \Hq)  and  p(z2\Ho)  approach  zero  simultaneously. 
Therefore,  accurate  tail  approximations  of  p(zj\Ho)  are  needed 
for  meaningful  results. 

Commonly-used  approximation  methods  such  as  the  central 
limit  theorem  (CLT)  do  not  provide  accurate  answers  in  the  tails. 
In  this  paper,  we  apply  the  multidimensional  saddlepoint  ap¬ 
proximation  (SPA)  [2],  [11],  which  can  provide  accurate  PDF 
tail  estimates. 


C.  Notation 

In  the  remainder  of  the  paper,  the  raw  input  data  x  = 
[.XT  •  •  •  xjy]'  is  a  set  of  independent  real  random  variables 


For  some  transformations  z  =  T(x),  the  exact  joint  PDF  of 
z  can  be  derived.  Some  of  these  transformations  can  be  seen 
as  special  cases  of  more  general  problems  for  which  we  have 
derived  approximations.  Therefore,  they  can  serve  as  important 
test  cases  for  the  more  general  results  (especially  in  the  tails). 

A.  Order  Statistics 

Order  statistics  are  important  features  in  classification. 
Examples  of  order  statistics  are  the  three  largest  FFT  bin 
magnitudes  or  the  median  of  N  sample  values.  The  joint 
distribution  of  a  collection  of  order  statistics  is  easily  found  for 
i.i.d.  RVs.  [12].  Consider  N  iid  samples  derived  from  x  using 

a  transformation  w„  —  T(xn).  Define  w  =  [ur  •••  wjx]'. 

,\ 

Let  y  =  [2/1  -  -  -  Pn\  be  obtained  from  w  by  sorting  into 

,\ 

ascending  order.  Let  z  =  [z±  •  •  •  Zm]'  be  a  subset  of  y. 
Specifically,  z1  -  yni,  z2  -  y„2,  ■  ■  ■ ,  zM  -  VnM,  where 
1  <  ni  <  n2  <  •  •  •  <  nM  <  N. 

Let  I  f,  (w)  and  pw  (w)  be  the  cumulative  distribution  func¬ 
tion  (CDF)  and  PDF  of  wn,  respectively.  Then,  for  M  —  2,  for 
example,  the  joint  PDF  is 

P~(z: l,  z2) 

= _ — _ \p  (7l  )](«i-i) 

(ni  -  l)!(n2  -  rn  -  l)!(iV  -  n2)\  L  wK  Vi 

■  Pw(zi)[Pw(z2)  -  Pw(zi)]{n2~ni~1)pw(z2) 
■[l-Pw(z2)]{N~n2\ 

Extending  this  to  arbitrary  order  M,  consider  the  nith,  ri2th, 
•  •  •,  n  iyth-order  statistics,  where 

1  <  ni  <  n2  <  •  •  •  <  riM  <  N 

follows  the  joint  PDF 

Pi(zl7  z2  •  •  •  zM) 

N\ 

(ni-l)!(n2-ni-l)!(n3-n2-l)!  ••• 

•  [Pw  (zi )] -1  )  [Pw  (z2) -Fu.(^i)]("'2_"'1_1) 

•  Pw(z2)[Pw(z3)~  Pw{z2)]{n3~n2~1',pw(z3)  ■  ■  ■ 

■  [1  -Pv(zm)]{N-,1m)Pw(zm)-  (6) 

We  consider  two  cases. 

1)  When  {x„  }  are  real  Gaussian  iV(0,  1)  RVs  and  wn  — 
|x„  |,  then  {tu,,}  are  Chi(l)  (chi-distributed  with  1  degree 
of  freedom),  and  we  have  pw(w )  =  (2 / \/72rK)e~w~ !2  and 
Pw(w)  —  ert(w/s/2)  for  w  >  0. 

2)  When  {x„ )  are  complex  Gaussian  CN(0,  2)  RVs  and 
wn  =  |x„  |2,  then  {tu,,  }  are  Chi-squared(2)  (chi-squared 
with  2  degrees  of  freedom),  and  we  have  pw{w )  — 
(l/2)e_“’/2  and  Pw(w)  —  1  —  e~w /2  for  w  >  0. 

Using  these  forms  for  Pw(w )  and  pw(w )  and  (6),  we  can  eval¬ 
uate  the  desired  PDFs  for  z.  Calculation  involves  some  delicate 
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numerical  problems,  especially  in  the  tails;  however,  these  can 
be  successfully  dealt  with,  often  by  resorting  to  exceedance  dis¬ 
tributions  instead  of  cumulative  distributions. 


B.  Autocorrelation  Function  and  Reflection  Coefficients 

A  widely  used  model  for  signal  processing  applications  is  the 
autoregressive  (AR)  filter  driven  by  white  Gaussian  noise  [13]. 
The  infinite  length  autocorrelation  function  (ACF)  completely 
describes  such  processes.  However,  practical  signals  have  an 
autocorrelation  function  that  either  decays  to  zero  or  is  periodic. 
Thus,  a  finite  number  of  autocorrelation  samples  often  provide 
adequate  information  to  characterize  the  process.  In  short,  a  set 
of  autocorrelation  samples  can  be  approximately  sufficient  for 
testing  statistical  hypotheses  concerning  the  process. 

There  are  many  ways  to  compute  an  autocorrelation  estimate 
[13].  These  methods  are  asympotically  equivalent  for  large  N 
but  differ  significantly  for  small  N.  We  will  concern  ourselves 
with  one  particular  ACF  estimator  because  an  exact  formula  can 
be  found  for  its  PDF,  which  is  due  to  Watson  [14].  In  particular, 
we  use  the  normalized  circular  autocorrelation  samples  f;. .  Let 


where 


and 


h  =  ~,  k  =  l--P 

TO 


i=  1 


(7) 


i= 1 


and  we  define  x^+i  =  #»•  Assume  that  N  is  odd  so  that 
n  —  (N  —  l)/2  is  an  integer.  Then,  the  exact  joint  PDF  of 
{fi,  f2  ■  ■■ rp }  is 


p(fi,  h  ■  ■  ■  rp) 


EE-E 


r(n) 
r(n  -  P) 

{ji,h  ■■■jp)eSP(r1,r2  —rp) 

■q{ju  32  ■■■  jp)  (8) 


where 


<l(j  1,  32  ■■■  jp ) 

=  sgn[det(B(ji,  j2  ■  ■  •  J»)][det(A(ji,  j2  ■  ■  •  jp))]11-^1 
det(C(ji,  j2  ■■■  jF)) 

and  sgn(jr)  =  1  for  x  >  0,  =  —  1  for  x  <  0,  and  =  0  for  x  —  0, 
and  the  matrices  A,  B,  C  are  of  dimension  (P  +  1)  x  (P  +  1), 
P  X  P,  and  {P  + 1)  X  (P+ 1),  respectively.  They  are  defined  as 


A(ji,  j2  •  •  •  jp) 


r  i 

hi 


t'2 

JJ 

hi 


U  7 


(!)  .,(2) 


JP 


JP 


rP 

jp 

hi 


y{n 

hp  J 


B(ii,  j2 


IV1) 

hi 

JJ  . 

hi 

••  7(P)1 

hi 

jp)  - 

'32 

J?)  . 

h  2 

.. 

h  2 

JJ 

L  'Jp 

JJ  . 

hp 

..  ~P 

'JP  -1 

c(ii,  j2  •  •  •  jp)  — 


39^31,  h—jp 


1  % 

1  7^ 


(!)  „(2) 


J?) 

hi 


A 


(!)  „(2) 


3p 


3p 


sn 


JP 

hi 


JP 

hp 


where 


7 


(G  _ 


cos(  ^  jk 


j  =  1,  2  •  •  •  n ;  k  =  1,  2  •  •  •  P. 


The  region  Sp(f i,  r2  •  •  •  rp)  is  the  convex  polyhedron  in 
li1'  formed  from  the  vectors 


L7y 

or  the  convex  polyhedron  of  the  vectors  {xi,  X2  •  •  •  x/>+1j  is 
the  convex  set  formed  by  the  linear  combination 


1 

hi 

VV 

Jp 

y(2) 

J?) 

'Jp 

hi 

V 

Jl  J 

1 

.  .  ^  0, 

_ 1 

•P+i 

^  ^  x  , 

i=l 

where  cr,  >  0,  and  cr,  <  1.  A  MATLAB  implementation 

of  (8)  is  provided  in  Appendix  A. 

1 )  Experimental  Verification:  To  validate  the  analysis,  it  is 
useful  to  compare  the  derived  PDF  with  experimental  values. 
The  above  analysis  can  be  verified  experimentally  by  comparing 
a  scatter  plot  of  the  first  two  normalized  circular  ACF  estimates 
with  an  intensity  image  of  the  PDF  obtained  from  the  program. 
Fig.  1  shows  such  a  comparison  for  N  —  9  samples.  The  figure 
illustrates  that  for  small  TV,  the  range  of  possible  ACF  values 
occupy  regions  with  linear  boundaries  in  the  plane.  The  shape 
of  the  PDF  itself  at  N  —  9  is  a  tetrahedron. 

2)  Reflection  Coefficients:  Due  to  the  one-to-one  transfor¬ 
mation  linking  the  normalized  autocorrelation  coefficients  with 
the  reflection  coefficients  [13],  it  is  possible  to  utilize  the  above 
results  to  find  the  exact  distribution  of  reflection  coefficients; 
however,  this  applies  only  when  the  circular  autocorrelation  co¬ 
efficients  (7)  are  used.  Consider  the  transformation 


[Kx  ■  ■  ■  Kp\  =  Tjh  ■  ■  ■  rp) 

where  {KJ  are  the  reflection  coefficients.  The  joint  PDF 
p(K i  ■  ■  ■  Kp)  requires  knowing  the  Jacobian  of  the  transfor¬ 
mation  Ti(  ),  which  is 


P—1 

log  Jr  =  ^  (F  “  *)  Ml  “  Ki)~ 

i- 1 
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Fig.  1.  (Left)  Comparison  of  (8)  with  (right)  a  scatter  plot  of  experimental  data  points.  Autocorrelation  estimates  were  computed  from  N  =  9  samples.  Notice 
the  “hole  ”  at  (0,0),  where  the  exact  PDF  cannot  be  computed. 


logp(Ki  ■  ■  ■  KP)  -  logp(?'i  •  •  •  fp)  +  log  Ji. 

3)  Log-Transformed  Reflection  Coefficients:  Because  the 
reflection  coefficients  are  subject  to  the  constraint  K,  <  1, 
the  joint  PDF  of  p(Ki  ■  ■  ■  Kp)  is  discontinuous  and  difficult 
to  approximate  using  standard  techniques  such  as  Gaussian 
mixtures  [15].  A  better-behaved  set  of  features  is  obtained  by 
the  one-to-one  mapping 


Kk  =  log  (  J 


The  log  of  the  Jacobian  of  this  transformation  is 


logJ2  =  -^>g[r— ^ 


log p(K[  ■  ■  ■  K'p)  -  logp(ATi  •  •  •  Kp)  +  log  J2. 


Let  z  =  [zi  ■  ■  ■  zm]'  t*e  an  A/ -dimensional  real  random 
vector  with  joint  MGF  <7,  (A)  =  £{exp(A/z)},  where  vector 
A  =  [Ai  •  •  •  AmL  Then,  the  joint  PDF  of  z  at  the  Al  -dimen¬ 
sional  point  u  =  [rti  •  •  •  um\  is  given  by  the  M th  order  con¬ 
tour  integral 

V~  =  {i2n)M  Jc  exP(~X>u)9~  (A) dX  (9) 

where  i  =  i/— 1,  and  the  Al -dimensional  contour  C  is  par¬ 
allel  to  the  imaginary  axis  in  each  of  the  M  dimensions  of  A. 
The  joint  cumulant  generating  function  (CGF)  of  z  is  c%( A)  = 
log  <j_  (A) .  The  most  useful  Al  -dimensional  saddlepoint  (SP)  of 
the  integrand  of  (9)  is  that  real  point  A  =  A(u)  in  Al  -dimen¬ 
sional  space  where  all  M  partial  derivatives  satisfy 

for  1  <  rn  <  Al.  (10) 

When  the  contour  C  is  moved  in  M  dimensions  to  go  through 
the  real  SP  A  and  the  change  of  variable 

A  =  A  +  it,  t  —  [fi  ■  ■  ■  (11) 


III.  Saddlepoint  Approximation  (Theory) 

The  class  of  statistics  for  which  the  exact  PDF  is  known  is 
relatively  limited.  For  a  broader  class  of  statistics,  the  exact  mo¬ 
ment  generating  function  (MGF)  is  often  known.  The  problem  is 
that  inversion  of  the  MGF  transformation  to  find  the  exact  PDF 
may  not  be  possible  in  closed  form.  For  these  cases,  we  can  use 
an  approximation  that  provides  accurate  tail  PDF  estimates. 


is  made,  (9)  becomes 

p,  (u)  =  (2tt)~m  exp(— A  u)  /  exp(— «t/u)s,i(A  +  it)  dt 

(12) 

where  the  new  Al  -dimensional  contour  passes  through  the  SP 
at  t  =  0. 
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The  logarithm  of  the  integrand  of  (12)  can  be  expanded  in  a 
power  series  about  the  origin  t  =  0  according  to 


log{exp(— it/u)^  (A  +  it)} 

=  —  it'll  +  c,  (A  +  it) 

~  — it'u  +  c*( A)  +it/Vci(A)  -  (13) 


where  the  M  X  M  matrix 


cm  = 


d2c-(  A) 
d\md\mi  j 


(14) 


is  symmetric  in  m  and  mi  for  all  A.  Thus,  using  (10)  and  (13), 
the  integrand  of  (12)  can  be  approximated  as 


exp(— it'u)^,  (A  +  it)  ~  exp  c,  (A)  —  \  t'Cz  (A)t 


05) 


for  small  |t|.  If  this  approximation  is  now  extrapolated  to  all  t 
and  substituted  in  (12),  there  follows  the  usual  saddlepoint  (or 
tilted  Edgeworth)  approximation  (SPA)  in  M  dimensions  [11] 


P~(u) 


exp 


Ci( A)  -  Au 


(27T) 


M 


exp 


exp 


c-(A)  -  Au 


(2tt)"/2  det(Cf(A)) 


1  1/2’ 


~t'cr(A)t 
A  =  A(u). 


oft 

(16) 


point.  Experience  has  shown  that  even  in  the  tails,  the  errors 
tends  to  be  in  the  “mantissa”  rather  than  in  the  “exponent.” 

There  is  no  reason  why  additional  terms  in  the  expansion 
cannot  be  obtained.  In  fact,  additional  terms  of  the  expansion 
have  been  derived  for  some  important  cases  including  linear 
functions  of  independent  RVs  and  are  available  in  an  NUWC 
technical  report  [16].  These  terms  can  be  used  not  only  to  pro¬ 
vide  additional  accuracy  but  as  an  indication  of  the  validity  of 
the  first-order  SPA  as  well.  Issues  of  SPA  accuracy  are  treated 
in  more  detail  in  the  technical  report. 

C.  Linear  Functions  of  Independent  RVs 

In  many  signal  processing  applications,  linear  transforma¬ 
tions  are  made  on  a  set  of  non-Gaussian  but  independent  RVs. 
Examples  include  Fourier  analysis  of  the  squared  magnitudes  of 
a  set  of  real  or  complex  time  samples,  autocorrelation  estimates 
(by  the  FFT  method),  and  cepstrum  estimates.  These  problems 
can  be  posed  in  the  form  z  =  Ay,  where  yn  —  Trl  (./■„ ),  and 
A  is  an  M  X  N  matrix.  Note  that  RVs  {yn}  are  independent 
but  not  necessarily  identically  distributed.  The  output  vector  z 
is  of  length  M,  where  M  <  N\  thus,  the  transformation  is  not 
one-to-one.  Let  the  MGFs  and  CGFs  of  RVs  {y„}  be  denoted 
{(7„(u)}  and  {c„(u)},  respectively.  That  is 

9n(v)  =E{exp(uy„)},  cn{v)  -  log gn{v) 

for  1  <  n  <  N  (18) 


A.  Finding  the  Saddlepoint  A(u) 

To  obtain  the  SPA  (16),  the  real  SP  A(u)  satisfying  (10)  must 
be  found.  The  SP  may  be  found  using  the  Newton-Raphson 
iteration 


where  E{  }  denotes  an  ensemble  average.  The  weighted  sum  of 
independent  RVs  of  interest  is  given  by 

N 

zm  —  £  amnyn  for  1  <  m  <  M  (19) 

n= 1 


A)!+i  —  A„  +  kCz  1(A„)(u  —  c(  (A,,))  (17) 

where  0  <  />'  <  1  is  a  step-size  parameter.  The  zero  vector 
A  =  [0,  0  •  •  •  0]/  can  always  serve  as  a  starting  point  because 
it  is  always  in  the  ROC.  At  each  iteration,  the  new  value  of  A„ 
must  be  tested  to  see  if  it  is  still  in  the  ROC  and  modified  if 
necessary. 

If  the  search  for  the  SP  is  confined  to  the  real  axes  in  multi¬ 
dimensional  (MD)  lambda  space,  it  can  be  shown  that  the  Hes¬ 
sian  matrix  of  the  joint  cumulative  generating  function  (CGF) 
is  positive  definite  for  all  real  lambda  inside  the  MD  region  of 
definition  of  the  joint  MGF.  This  means  that  the  MD  integrand 
has  a  bowl-like  behavior  with  a  single  minimum  in  this  real  MD 
space.  Thus,  the  Newton-Raphson  search  procedure  will  always 
find  the  single  minimum  if  conducted  in  an  “appropriately  slow” 
fashion  and  if  the  search  is  constantly  confined  to  the  region  of 
definition. 


where  M  <  N,  and  the  M  X  N  real  matrix  [«„„,]  is  arbitrary, 
except  that  it  must  have  rank  M. 

The  joint  MGF  of  RVs  {zm}  is,  upon  use  of  (18)  and  (19)  and 
the  independence  of  RVs  {yn} 


<7*(Ai  •  •  •  Am)  =  E  | exp  Am^?)^ 


(  /  M  N  N 

=  E  <  6xp  I  ^  ^  Am  ^  ^  ^mnUn 


\m= 1  n= 1 

/  M 


m=  1 


=  JJ  E  |  exp  ^  amnK 

N 

-  IX  9n(b„{\)) 


n= 1 

where  A  =  [Ai  •  •  •  Ajif]',  and 


(20) 


B.  Accuracy  of  the  SPA 

The  accuracy  of  PDF  approximations  can  be  experimentally 
determined  in  the  tails  if  an  exact  formula  is  available  for  some 
special  case.  We  use  this  approach  whenever  possible  in  what 
follows. 

Because  the  SPA  is  based  on  a  series  expansion  of  the  MGF 
at  the  SP,  its  accuracy  depends  on  the  shape  of  the  MGF  at  this 


M 

b„{ A)  =  ^2  amnAm  for  1  <  n  <  N.  (21) 

m=  1 

That  is 

N 

9zW  =  9n(pn( A)).  (22) 

n= 1 
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The  joint  CGF  of  RVs  {zm}  is,  using  (18) 

N 

c-X  A)  =  log  S'*  (A)  =  53  c„(6„(  A)).  (23) 

n=l 

There  follows,  by  reference  to  (21) 

8c  (Al  N 

— i —  =  V'  amndn(bn (A))  for  1  <  m  <  M.  (24) 
m  )!  =  1 

Finally,  the  M  simultaneous  equations  that  must  be  solved  for 
the  M-dimensional  saddlepoint  A  are 

N 

53  amndn{bnW)  -  Zm  for  1  <  m  <  M  (25) 

n=l 

where  z  —  [zi  ••  •  zm\  are  the  particular  values  at  which  the 
joint  PDF  of  RVs  {  )  in  (19)  is  to  be  evaluated.  The  second- 

order  partial  derivatives  required  to  obtain  the  SPA  follow  from 
(21)  and  (24)  as 

o2  /  \  \  ^ 

—  =  53  amnam+nc"{bnW)  for  1  <  m,  m*  <  M. 

OAmOAm* 

n=  1 

(26) 

Once  saddlepoint  A  is  found  from  (25),  it  can  be  substituted  in 
(26),  and  the  M  X  M  Hessian  matrix  can  be  evaluated  at  the 
saddlepoint,  namely,  Cf  (A). 

IV.  Applications  of  the  Saddlepoint  Approximation 

A.  Linear  Sums  of  Magnitude-Squared  Gaussian  Random 
Variables 

An  important  set  of  statistics  are  weighted  sums  of  Chi- 
squared  RVs.  Specifically,  statistics  of  the  form 

N 

Zm  —  £  amnyn,  for  1  <  m  <  M  (27) 

n= 1 

where  yn  —  |ay,  |2  and  where  xn  is  a  real  or  complex  Gaussian 
RV,  are  frequently  encountered  in  signal  processing.  Included 
are  least-squares  polynomial  approximations  of  magni¬ 
tude-squared  time  series,  Fourier  analysis  of  squared  time 
series  or  FFT  output  bins,  and  two-dimensional  (2-D)  Fourier 
analysis  of  images  or  spectrograms.  Such  statistics  are  widely 
used  in  feature -based  classification  problems.  We  consider  both 
the  case  when  RVs  {j:„  )  are  real  and  when  {j:„  )  are  complex. 

1 )  Complex  RVs:  Assume  that  {x„  }  are  complex  zero-mean 
Gaussian  CN(0,1)  RVs  and  yn  =  |atn.|2.  Then,  yn  are  expo¬ 
nentially  distributed  py(y)  —  e~y .  Alternatively,  we  may  say 
that  u  —  2y  has  the  Chi-squared  distribution  with  2  degrees  of 
freedom,  which  are  denoted  p ,,  (u)  —  x  2  (u-  2).  Thus,  py(y )  = 
2pu(2y). 

Since  MGF  gn  (A)  is  not  a  function  of  n,  we  write  gn  (A)  = 
gy( A)  =  1/(1  — A),  where  A  <  1.  We  have  cy(A)  =  logfl'y(A)  = 
—  log(l  —  A)  for  A  <  1.  Substituting  this  into  (23) 

N  /  M  \ 

C-(A)  =  log^(A)  =  -  53  l0S  f  1  “  5^  A mamn  j  (28) 

n= 1  \  m= 1  / 


where  the  ROC  is  defined  by 

M 

/  (  A„,.am„.  V  1,  for  n  —  1  •  •  •  A . 

m=  1 


Taking  derivatives 


dcX  A) 
d\m 


N 


£ 


^ mn 


for  m  =  1  •  •  •  M. 


-1 


The  second  derivatives  are 


d2cX  A) 

8  Xm8  Xp 


N 


M 


-2 


£  i-£  A;ain  I  (  f,;pp  (hn  p  ) 

n=l  \  l-l  ) 

for  1  <  m,  p  <  M 


(29) 


from  which  we  can  construct  M  X  M  Hessian  matrix  CA  (A). 
Finally,  we  use  C-  (A),  c*( A)  in  (16). 

2)  Real  RVs:  When  {.r„  j  are  real  zero-mean  Gaussian 
N( 0,1)  RVs,  the  distribution  of  yn  —  |a’„|2  is  y2  with  1 
degree  of  freedom;  thus,  py(y)  —  X2(lL  !)■  Thus,  gy(X)  — 
1/ i/l  —  2A,  where  A  <  0.5.  Similar  to  the  complex  case, 
we  have  gn( A)  =  §y(X)  —  1/y/l  —  2A,  where  A  <  0.5.  We 
have  Cy  (A)  =  log^y(A)  =  — (l/2)Iog(l  -  2A)  for  A  <  0.5. 
Substituting  this  into  (23) 

N  /  M 

c-(X)  -  loggX A)  =  -  53  lc,g  (  1  “  2  5^ 

n= 1  \  m= 1 

where  the  ROC  is  defined  by 

M 

/  (  Xm(imn  O.o,  for  n  —  1  •  •  •  IV . 

m= 1 


Taking  derivatives 


dc-X  A) 

dXm 


M 


—  2  53  amn  F-2£  A^ftln 


n= 1 


l-l 


for  1  <  m  <  M. 


The  second  derivatives  are 

d2cX  A) 


8  Xm8  Xp 


M 

2  53  X{(1  In 


-2 


1  2  y  A^tiin  |  (4.apnamn) 

for  1  <  m,  p  <  M 


from  which  we  can  construct  Hessian  matrix  6/ (  A). 

3)  Experimental  Validation:  To  validate  the  above  results, 
especially  in  the  tails,  it  is  necessary  to  find  a  case  for  which 
the  exact  result  is  known.  Notice  that  (27)  may  be  written  in  the 
matrix  form 


z  =  Ay 

where  A  is  a  general  full-rank  M  X  N  matrix.  A  class  of  ma¬ 
trices  A  for  which  the  exact  joint  PDF  of  z  can  be  calculated  is 
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described  by  Nuttall  [17].  A  special  case  of  this  more  general 
class  is  the  following.  Let 


u  =  Aiy 


where 


Ai 


2  0  2  0 
0  2  0  2 


2  0 
0  2 


Clearly  then,  u\  and  u_>  are  independent.  Furthermore,  for  the 
case  of  Section  IV-A1,  ui  and  n_>  are  chi-square  RVs  with  N 
degrees  of  freedom.  Thus,  it  is  straightforward  to  write  down 
the  PDF  p,,  (u).  The  above  can  be  generalized  if  we  linearly 
transform  u  using  a  general  full-rank  two-by-two  matrix  A2. 
Let 


z  =  A2Aiy  =  A2U. 
Then,  the  joint  PDF  of  z  will  equal 


P»( z)  =  Pu( A2  xz)|  det(A2)|  1. 
We  tried  this  approach  with  N  =  128  and 


and  { y,  \  derived  from  complex-valued  data,  as  in 
Section  IV-A1. 

The  experiment  was  designed  to  probe  the  tails  of  the  PDF. 
This  was  accomplished  by  generating  data  using  a  variance  dif¬ 
fering  from  one  (the  assumed  PDF).  In  the  experiment,  1000 
data  samples  of  z  were  generated  using  Xj  ~  CN(0,  a2),  where 
a2  was  chosen  randomly  according  to  a2  —  w2  and  where 
w  ~  !V(0,  100).  The  SPA  error  was  determined  by  comparing 
with  the  exact  expression.  The  results  are  shown  in  Fig.  2.  The 
error  was  approximately  +0.0026  for  all  samples  except  one, 
for  which  the  error  was  —0.0026.  It  is  unclear  why  this  pat¬ 
tern  occurred.  Notice,  however,  that  this  very  low  approxima¬ 
tion  error  occurred  in  the  deep  tails  where  log++(z)  is  as  low  as 
— 140  000.  The  accuracy  that  is  obtained  by  the  SPA  method  de¬ 
pends  ultimately  on  the  accuracy  of  the  integral  approximation 
(15)  and  the  accuracy  to  which  the  saddlepoint  itself  is  deter¬ 
mined.  These,  in  turn,  depend  on  the  matrix  A  as  well  as  the 
univariate  MGFs  gy  (y) .  Issues  of  the  approximation  accuracy 
have  been  studied  by  Nuttall  [16].  In  particular,  additional  terms 
of  the  power  series  expansion  (13)  can  be  obtained  for  the  case 
of  the  linear  function  of  iid  RVs. 


B.  Cepstrum  and  Autocorrelation  Estimates  ( Noncontiguous ) 

It  is  possible  to  represent  the  cepstrum  and  autocorrelation  es¬ 
timates  as  linear  functions  of  a  set  of  independent  (but  not  identi¬ 
cally  distributed)  RVs,  allowing  the  results  of  Section  III-C  to  be 
used.  Recall  that  samples  { xt },  0  <  f  <  N  —  1  are  independent 
identically  distributed  (iid)  real  Gaussian  RVs  with  zero  mean 
and  unit  variance.  The  corresponding  complex  Fourier  coeffi¬ 
cients  are  defined  as 


Xk 


2  X  1/2JV-1 

—  )  ^2  Xt  exp(— z27t kt/ N) 

'  t= 0 

for  0  <  k  <  N  —  1. 


Fig.  2.  PDF  estimation  error  for  SPA  in  the  far  tails  using  data  generated  with 
random  variance.  The  vertical  axis  shows  the  difference  between  the  log-PDF 
values  of  the  SPA  and  the  exact  expression.  The  error  was  +2.6e-3  most  of  the 
time,  except  for  one  occurrence  of  —  2.6e-3. 


Since  {./■, }  are  real,  it  follows  that  Xy-k  =  X/  for  1  <  k  < 
iV  —  1;  however,  Xo  and  Ay/ 2  are  always  real. 

Define  the  set  of  N  real  spectral  quantities 

Case  I  (Autocorrelation):  Yk  =  X/.  |2 

Case  II  (Cepstrum):  Yk  =  log  |  A' / .  | 2 

for  0  <  k  <  N  —  1  and  N  even.  Then,  Yy-k  —  1+  for  1  < 
k  <  N  —  1.  Finally,  perform  an  inverse  FFT  back  into  the 
time  domain  to  get  the  ACF  or  cepstrum  estimates,  respectively, 
according  to 

Ar— 1 

wt  =  Ifc  exp(z27t kt/N),  forfl  <  t  <  N  —  1.  (32) 
k=  0 

For  Case  I,  wt  are  autocorrelation  estimates  (scaled  by  2 N), 
and  for  Case  II,  they  are  cepstrum  estimates  (also  with  a  special 
scaling). 

Expression  (32)  can  be  written  purely  in  terms  of  real  vari¬ 
ables 


N/2 

wt  —  ^2  'A  cos(2tt ht/N)Yk,  for  0  <  t  <  N/2  (33) 

k- 0 

where 

A  f  1,  for  k  =  0  and  N/2  I 
e.h  —  s  r .  (34) 

l  2,  for  1  <  k  <  N/2  -  1  J 

The  remainder  of  RVs  {wt}  can  be  found,  if  desired,  from  the 
relation  wpr_t  —  wt  for  1  <  t  <  N  —  1. 

Complex  RVs  {A;.}  in  (30),  for  0  <  k  <  N/2,  are  all  in¬ 
dependent  of  each  other,  due  to  the  white  Gaussian  statistics  of 
{xt}  that  were  assumed.  We  let 


(30) 


Xk  =  Uu  +  jVk  for  0  <k<  N/2. 


(35) 
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Then,  Vo  =  0,  V/yy2  =  0,  and 

E(U?)  =  2,  E(C/|r/2)  =  2,  E(t/2)  =  E(lf)  =  1  (36) 

for  1  <  k  <  N/2  -  1.  All  these  {(',,)  and  {V;,|  RVs  in  (35) 
are  independent  of  each  other  and  are  Gaussian  zero  mean.  We 
express  (33)  in  the  compact  form 

N/2 

wt  —  22  atkYk,  for  0  <  t  <  N/2  (37) 

k-0 

where 

atk  =  ej;  cos(27t kt/N),  for  0  <t,k<  N/2.  (38) 

In  (37),  all  of  the  { Yk  }  RVs  for  0  <  k  <  N/2  are  independent 
of  each  other  because  all  of  the  RVs  { A'/. }  for  0  <  k  <  N/2 
are  independent  of  each  other. 

1 )  Case  I — Autocorrelation  Estimates:  From  the  informa¬ 
tion  above,  it  is  seen  that  RVs  (7q  and  £/^y,,  if  scaled  by  1/2, 
have  a  chi-squared  distribution  with  1  degree  of  freedom.  The 
X2(l)  PDFisp(rt)  =  (1/2v^7t)(m/2)-1/2  exp(— u/4.)  for -it  > 
0,  with  MGF  g(v)  —  1  / \J  1  —  2v  for  v  <  0.5.  It  follows  that 
without  the  scaling,  the  MGF  is 


9o(v)  -  g(2v) 


i/l  —  4v 


for  v  <  0.25. 


for  v  <  0.5. 


1  -  2v' 

In  compact  notation,  define  the  MGFs  of  all  the  { Yk }  in  (37) 
A  f  9o(v),  for  k  —  0  and  N/2 


gt  (v) 

.  9i(v),  f°r  1  <  k  <  N/2  —  1 
Then,  the  corresponding  CGFs  for  the  { 1). }  in  (37)  are 

c*k.{v)  =  log  g\.{v) 

co(v)  —  log go(v) ,  for  k  —  0  and  iV/2 
ci(v)  =  loggi/v),  for  1  <  k  <  N/2  -  1 


Write  this  expression  as 

N/2 

zm  -  ^2  bmkYk  for  1  <m<M  (45) 


k= 0 


where 


Zm  —  ,  bmf;  —  l: 

for  1  <  to  <  M,  0  <  k  <  N/2.  (46) 

Let  vector  A  =  [Ai  •  •  •  Aj\f]/  and  random  vector  z  = 
[z i  •  •  •  z_\i\ .  The  joint  MGF  of  the  M-dimensional  vector  z  is 

9s.  (A)  =  E  {exp  (A'z)} 

=  E  |exp  Xmz„^j  | 

{/  m  n/2  ^ 

exp  ^  Yn  ^  bmkYk 

\m=  1  k= 0 

=  U  E  |  exp  ( Yk  22  bmkX„^\  | 
k= 0  l  V  m= 1  )  ) 

N/2 

=  n  9l(dk{\)) 


(47) 


k= 0 


(39) 


In  addition,  RVs  (U/  +  V/:),  for  1  <  k  <  N/2  —  1,  have  PDF 
exp(— u/2)/2  for  u  >  0,  with  MGF 


(40) 


where  we  used  the  independence  of  RVs  { X/  }  [see  (39)— (41)] 
and  defined 

M 

dk{X)  =  22  bmkXm,  for  0  <  k  <  N/2.  (48) 

m= 1 

The  joint  CGF  of  z  follows,  from  (42)  and  (47),  as 


N/2 


Cz( A)  -  logs'- (A)  -  22  ck(Mx)). 


(49) 


k- 0 


(41) 


The  partial  derivatives  of  interest  are 


d 
d\ n 


N/2 


Cz( A)  =  22  bmk4'(dkW),  for  1  <  m  <  M 


k= 0 


where  we  used  (48)  to  determine 
d 


(50) 


dXr 


<4 (A)  =  bmk,  for  1  <  m  <  M,  0  <  k  <  N/2. 


(42) 


Now,  we  will  consider  a  subset  of  all  the  N  {wt}  RVs  origi¬ 
nally  defined  in  (32)  and  then  manipulated  into  forms  (33)  and 
(37).  In  particular,  consider  only  the  set 

N/2 

wt~22  atkYk,  for  t  —  h  ■■■  tM,  M  <  N/2  +  1  (43) 
k= o 

where  {tm}  are  M  arbitrary  distinct  time  instants  in  the  interval 
[0,  N/2].  That  is 

N/2 

Wtm  -  22  aUnkYk  for  1  <  TO  <  M.  (44) 

k- 0 


(51) 


The  M  simultaneous  equations  that  must  be  solved  for  saddle- 
point  A  are,  from  (46)  and  (50) 

N/2 

22<hmkct'{dk{x))  —  wtm,  for  l<m<M,  M<N/2+l 

A— ii 

(52) 

where  are  the  particular  M  values  of  RVs  {wt}  and 

where  the  joint  PDF  of  the  ACF  estimates  is  of  interest.  We  also 
have,  from  (50)  and  (51) 

d2  N 

c-X A)  =  22  bm.1kbm.2kCk" (dk(X)) 


@Xm±  dXm.2 


k- 0 


for  1  <  toi,  to 2  <  M.  (53) 
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The  MGFs  for  {Yk}  were  presented  in  (39)  and  (40).  The  func¬ 
tions  required  in  the  SPA  are 

g0(v)  =  \/\J\-  Av 
co(v )  =  — 0.51og(l  -  Av) 

Co(v)  = 

cg(0=- 


and 


Av 

8 


(1  -  Av)2 

gi{v)  =1/(1  -  2v) 
ci(v)  —  log(l  -  2v) 

<<l(v)=-  2 


1  -  2v 
A 

(1  -  2v)2' 


2)  Case  II:  Cepstrum  Estimates:  From  the  information 
above,  it  is  seen  that  RVs  U2  and  Vyp>  have  PDF  exp(— u/A)/ 
( Ann )1/2  for  it  >  0.  In  additon,  RVs  (U2  +  V/2),  for  1  < 
k  <  N/2  —  1,  have  PDF  exp(— u/2)/2  for  u  >  0.  It  follows 
that  RVs  To  and  T(yy2  have  PDF  exp(u/2  —  eu  /  A)  /  (Att)1/2 
for  all  u,  whereas  RVs  T/ ,  for  1  <  k  <  N/2  —  1,  have  PDF 
exp(«  —  e“/2)/2  for  all  u.  This  immediately  leads  to  the 
common  MGF  for  To  and  T'V/2  in  the  form 

g0(v)  =  AvT(v  +  1/2)/ \/n  for -1/2  <v  (56) 

whereas  the  common  MGF  for  T/ ,  1  <  /,:  <  N/2  —  1  is 

gi(v)  =  2vT(v  +  1)  for  —  1  <  v.  (57) 

The  functions  required  for  the  SPA  for  the  cepstrum  estimates 
are 

go(v)  =4T(v  +  1/2)/ \ptt 

c0(v)  —  v  log(4)  +  logT(i>  +  1/2)  -  (l/2)log(7t) 

c'0(v)  =  log(4)  +  f(v  +  1/2) 

co(v)  =4’\v  +  1/2) 

c'0»=V<>  +  1/2) 

cg>)=V’'>  + 1/2)  (58) 

and 

gi(v)  —  2vY(v  +  1) 
d(v)  —  v  log(2)  +  logT(u  +  1) 
di(v)  -  log(2)  +  t[’(v  +  1) 
d{(v)  =i//(v  +  1) 

£'»=Y>"(v  +  1) 

cT(v)  -i’'"(v  +  l)  (59) 

where  <$  is  the  psi  function  (see  [18,  Sec.  6.3  and  6.4], 

3 )  Experimental  Validation:  Exact  solutions  to  validate  the 
unnormalized  ACF  estimates  (Section  IV-B1)  and  Cepstrum  es¬ 
timates  (Section  IV-B-2)  have  not  yet  been  worked  out;  there¬ 
fore,  approximation  error  cannot  yet  be  determined.  Note,  how¬ 
ever,  that  the  basic  approach  is  the  same  as  Section  IV-A1, 


Fig.  3.  Comparison  of  the  method  of  Section  IV-B2  with  a  Gaussian  mixture 
approximation  with  standard  normal  input  data. 

which  has  been  validated  experimentally.  The  approaches  differ 
in  the  choice  of  matrix  A  and  the  fact  that  RVs  {y„}  are  not  all 
identically  distributed. 

In  spite  of  that,  it  is  still  a  good  idea  to  validate  the  analysis 
with  another  entirely  different  approach.  To  do  this,  we  used 
a  Gaussian  mixture  (GM)  PDF  approximation  using  simulated 
data.  While  a  PDF  approximation  obtained  from  simulated  data 
cannot  test  errors  in  the  tails,  it  can  at  least  validate  the  PDF  in 
the  neighborhood  of  the  peak.  With  this  in  mind,  the  method  of 
Section  IV-B2  was  tested  using  a  noncontiguous  set  of  cepstrum 
coefficients.  We  took  4000  independent  samples  of  a  set  of  eight 
cepstrum  outputs  from  a  size- 1024  cepstrum.  The  cepstrum  in¬ 
dexes  were  {£,- }  =  {2,  5,  8,  11,  12,  13,  14,  15}.  The  cepstrum 
processor  was  excited  with  RVs  {a;,  }  from  the  standard  normal 
distribution.  The  4000  eight-tuples  were  used  as  training  data 
for  a  GM  PDF  estimator  [15].  The  same  data  was  used  to  pro¬ 
duce  Fig.  3.  In  this  plot,  we  evaluated  the  log-PDF  for  each  data 
sample.  The  log-PDF  from  the  GM  approximation  is  plotted  on 
the  x  axis,  and  the  value  from  the  method  of  Section  IV-B2  is 
plotted  on  the  y  axis.  Ideally,  all  data  points  should  fall  on  the 
x  —  y  line.  The  plot  shows  good  agreement,  considering  the 
fact  that  the  PDF  approximation  is  in  a  relatively  high  dimen¬ 
sion.  The  errors  increase  in  the  tails;  however,  this  is  due  to  the 
mixture  approximation  and  not  the  SPA. 

C.  Other  Applications 

The  SPA  is  applicable  whenever  the  MGF  or  CGF  can  be 
derived.  The  SPA  has  been  derived  for  additional  statistics  in¬ 
cluding  correlated  and  non-Gaussian  statistics  [16],  [19], 

V.  Other  Asymptotic  Methods 

A.  Reflection  Coefficients  and  Autocorrelation  Estimates 
(Contiguous) 

For  large  TV,  an  asymptotic  form  is  available  for  the  reflection 
coefficients,  which  is  due  to  Daniels  [2],  Let  [iv/  •  •  •  Kp\  be 
the  reflection  coefficients  derived  from  the  Levinson  recursion 
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on  [?'i  •  •  •  [13].  The  general  form  for  the  asymptotic  (large 

N)  distribution  of  \K±  ■  ■  ■  Kp]'  is  the  following.  Let 


c0  =  logT 


log7T 


-  logT 


TV-  1 
2 


logT  (TV  +  1)  -  N  log  2  -  logT  ^ 

-  logT 


TV  +  3 


Let  ne  be  the  largest  integer  less  than  or  equal  to  P/2,  and  let 
n0  be  the  smallest  integer  greater  than  or  equal  to  P/2.  Then 


logp(Ki  •  •  •  KP) 

ne 

=  E 


i= 1 


N  —  3 

2  log(l  +  K2i)  +  — —  log(l  -  Kl) 


^2  \  c°  +  +  K2i-l)  +  log(l  -  P|_i) 


i= 1 


(60) 


Fig.  4.  Comparison  of  theoretical  and  approximate  PDF  for  normalized  ACF 
estimates. 


As  explained  in  Section  II-B2,  the  normalized  autocorrela¬ 
tion  function  estimates  are  related  to  the  reflection  coefficients 
by  a  one-to-one  transformation  [13].  The  PDF p(?\  •  •  •  fp)  re¬ 
quires  knowing  the  Jacobian  of  the  transformation  from  ACF  to 
reflection  coefficients,  namely 

p- 1 

l0gJ=-^(P-2)l0g(l-P?). 

1=1 

Thus 

logp(fi  •  •  •  fp)  -  log  J  +  logp(Ki,  K2  ■■■  Kp).  (61) 
B.  Experimental  Validation 

We  can  compare  the  approximation  for  p(f i,  f2  •••  fp) 
from  Section  V-A  with  the  exact  expression  from  Section  II-B. 
In  order  to  evaluate  tail  accuracy,  independent  samples  were 
passed  through  an  AR  filter  of  order  4  to  make  them  correlated. 
The  AR  filter  coefficients  were  chosen  at  random  by  selecting 
reflection  coefficients  from  a  uniform  distribution  in  the  range 
[—1,1]  and  then  transforming  to  AR  coefficients  and  ACF 
samples.  Two-hundred  independent  trials  were  computed.  The 
log-PDF  value  from  the  exact  method  of  Section  II-B  was 
plotted  on  the  X  axis  of  Fig.  4,  and  the  difference  between  the 
log-PDF  from  (61)  and  the  exact  expression  is  plotted  on  the  Y 
axis.  The  error  was  quite  small  (less  than  1  in  magnitude)  for 
samples  with  log-PDF  values  above  —10  but  increased  in  the 
tails.  This  could  reflect  errors  due  to  an  inherent  assumption 
of  independence.  Note,  however,  that  the  errors  are  acceptable 
even  at  very  low  values  of  log-PDF. 

VI.  Application  to  Autoregressive  (AR)  Model-Order 
Selection 

Consider  the  problem  of  determining  the  order  P  of  an  au¬ 
toregressive  process,  which  is  denoted  ART).  The  MAP  rule 

argmaxp(Pp|x)  =  maxp(x\Hp)p(Hp)  (62) 


where  Hp  is  the  hypothesis  corresponding  to  order  P.  may  be 
implemented  using  the  class-specific  approach.  Applying  (3), 
we  have 


arg  max 


p(zp\Hp) 

p(zP\H0) 


(63) 


where  we  have  assumed  that  p(Hp)  is  a  constant,  Ho  corre¬ 
sponds  to  the  case  of  iid  Gaussian  noise,  and  zp  is  a  sufficient 
statistic  for  the  ART)  process.  Of  course,  this  assumes  we  have 
the  prior  knowledge  of  p(z,p\Hp).  It  is  interesting  to  note,  how¬ 
ever,  that  the  denominator  term  in  (63)  usually  has  a  dominant 
effect  on  the  decision.  In  fact,  Kay  [6]  has  shown  that  omission 
of  the  numerator  in  (63)  and  using  the  rule 


arg  max 


pP(zp\H0) 


(64) 


implements  the  conditional  model  estimator  (CME),  which 
has  been  shown  to  outperform  the  minimum  description  length 
(MDL)  rule  [7],  However,  (63)  should  be  an  upper  bound  in 
performance  due  to  knowledge  of  the  numerator  PDF  (which 
we  call  the  a  priori  PDF  of  zp). 

We  now  test  formulas  (63)  and  (64)  and  compare  with  the 
Akaike  and  MDL  method  [13].  As  an  approximate  sufficient 
statistic  for  an  ART)  process,  we  use  the  vector  of  circular  ACF 
estimates  (7),  which  is  denoted 


iP  -  [h/fo,  f2/f o  •  •  •  fp/f0]'. 


In  the  experiment,  we  generate  an  AR  process  with  known  P  in 
the  range  1  <  P  <  4.  We  utilize  odd  data  record  lengths  ofiY  = 
11, 13,  15,  17,  21, 127,  and  255  with  1000  independent  trials  for 
each  combination  of  P.  N .  In  each  trial,  the  AR  process  is  de¬ 
termined  by  randomly  selecting  the  reflection  coefficients  (RCs) 
from  a  uniform  distribution  on  [— 1,1].  The  only  restriction  was 
that  the  Pth  RC  had  to  be  greater  than  0.2  in  magnitude  (to  make 
sure  the  data  model  was  truly  of  order  P).  The  RCs  were  then 
converted  into  AR  coefficients,  and  an  AR  process  was  created 
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by  filtering  independent  Gaussian  noise.  The  model  order  selec¬ 
tion  was  accomplished  by  determining  the  best  fit  of  the  given 
approach  over  1  <  P  <  5.  We  compared  the  following  ap¬ 
proaches: 

•  CS-RC:  Class-Specific  Using  RC:  Because  the  RC  esti¬ 
mates  are  related  to  rJ>  by  a  one-to-one  transformation, 
they  are  equivalent  from  a  sufficiency  point  of  view.  We 
may  approximate  the  PDF  of  the  RC  estimates  p(Kp\Hp) 
by  the  PDF  of  the  true  RCs  used  in  the  experiment.  This 
approach  should  provide  somewhat  of  an  upper  bound  of 
performance  since  it  makes  use  of  knowledge  not  avail¬ 
able  to  the  other  methods.  Let  K.J  =  \K\,  K>  •  •  •  Kp]' 
be  the  vector  of  RC  estimates.  Equation  (63)  becomes 


KKF|g/>) 

p(Kr\Ho) 


(65) 


with  (61)  used  to  approximate  the  denominator  PDF.  We 
used  the  prior  density  of  p{K.F\Hp)  —  2~p,  which  is 
the  density  used  for  the  true  RCs  in  the  experiment.  We 
ignored  the  effect  of  the  constraint  \Kp\  >  0.2  and  the 
fact  that  the  RC  estimates  K  do  not  in  fact  have  the  same 
density  as  the  true  RCs. 

•  CME-ACF — CME  Approach  Using  ACF:  Implementing 
CME  using  rF,  (63)  becomes 


TABLE  I 

Results  for  P  =  1 .  Probability  of  Correct  Model  Order  in  Percent 


N 

CS-RC 

CME-ACF 

Akaike 

MDL 

11 

85.3 

65.6 

60.5 

68.5 

13 

87.4 

69.5 

62.1 

72.8 

15 

87.0 

71.2 

66.6 

78.9 

17 

88.4 

74.5 

68.4 

81.7 

21 

90.2 

78.4 

70.0 

85.8 

127 

96.4 

94.2 

74.4 

96.7 

255 

97.0 

96.5 

75.1 

97.6 

TABLE  II 

Results  for  P  =  2.  Probability  of  Correct  Model  Order  in  Percent 


N 

CS-RC 

CME-ACF 

Akaike 

MDL 

11 

28.2 

38.2 

27.8 

26.9 

13 

32.4 

40.8 

30.4 

30.3 

15 

40.4 

47.3 

35.6 

36.8 

17 

41.9 

48.1 

38.7 

38.0 

21 

53.7 

57.0 

48.3 

48.8 

127 

86.3 

86.5 

67.8 

86.0 

255 

93.8 

95.1 

70.8 

94.1 

max  — t - t-  (66) 

F  p(vp\H0) 

with  (61)  used  to  approximate  the  denominator  PDF. 

•  Akaike  Method  ( Circular  ACF):  The  Akaike  method  is 

min  {TV  logaf>  +  2 P}  (67) 

where  dj,  is  the  estimate  of  the  power  of  the  white  noise 
driving  sequence  computed  from  the  ACF  estimates  r 
using  the  Levinson  algorithm  [13].  The  circular  ACF 
estimates  were  used. 

•  Minimum  Description  Length  (MDL):  The  MDL  method 
is 


nun  {TV  log  dj>  +  P  log  IV}  .  (68) 

Results  of  the  experiment  are  provided  in  Tables  I-IV  for  tme 
value  of  P  ranging  from  1  to  4.  It  is  difficult  to  compare  the  per¬ 
formance  of  the  various  approaches  based  on  just  one  value  of 
true  P.  This  is  due  to  biases  that  cause  a  given  approach  to  per¬ 
form  better  at  a  particular  P  and  worse  at  another.  For  example, 
the  Akaike  method  is  biased  toward  a  higher  value  of  P  and  thus 
appears  better  at  P  —  4  for  low  values  of  N .  Average  perfor¬ 
mance  (averaged  over  P)  is  plotted  in  Fig.  5.  The  results  show 
that  at  low  N,  the  CS-RC  and  CME-ACF  methods  outperformed 
both  the  MDL  and  Akaike  methods  consistently  by  about  3%. 
At  high  TV,  all  methods  were  similar,  except  the  Akaike  method, 
which  is  known  to  be  an  inconsistent  estimator  of  model  order. 
The  advantage  of  the  known  prior  in  the  CS-RC  approach  was 


TABLE  III 

Results  for  P  =  3.  Probability  of  Correct  Model  Order  in  Percent 


N 

CS-RC 

CME-ACF 

Akaike 

MDL 

11 

9.2 

17.7 

11.5 

10.0 

13 

15.0 

20.3 

16.3 

14.7 

15 

21.0 

27.9 

23.4 

21.0 

17 

24.7 

29.9 

27.5 

22.3 

21 

33.3 

36.9 

33.6 

29.4 

127 

82.9 

85.9 

64.7 

83.2 

255 

86.6 

89.8 

65.2 

87.3 

TABLE  IV 

Results  for  P  =  4.  Probability  of  Correct  Model  Order  in  Percent 


N 

CS-RC 

CME-ACF 

Akaike 

MDL 

11 

6.6 

8.5 

15.5 

12.2 

13 

7.9 

10.5 

17.4 

13.6 

15 

10.6 

13.3 

20.4 

14.6 

17 

16.2 

16.6 

23.6 

17.4 

21 

24.1 

25.9 

31.9 

25.6 

127 

72.7 

74.7 

63.8 

72.8 

255 

80.4 

84.1 

65.2 

81.0 

not  significant  compared  with  CME-ACF,  although  it  appears  to 
always  outperform  the  CME-ACF  method  by  a  small  amount. 
The  CS-RC  approach,  which  has  some  prior  knowledge  of  the 
distribution  of  the  RCs,  provided  better  performance  most  of 
the  time  than  the  other  approaches;  however,  it  appears  to  fall 
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Fig.  5.  Model  order  selection  results  averaged  over  P  =  1-4. 


below  CME  at  high  N .  This  can  be  explained  by  the  fact  that 
the  CS-RC  approach,  while  it  has  prior  knowledge,  is  only  an 
approximate  MAP  implementation. 

VII.  Conclusions 

In  this  paper,  we  have  provided  approximations  to  the  joint 
multidimensional  PDFs  of  some  important  statistics  in  signal 
processing,  including  autocorrelation  estimates,  Cepstrum  es¬ 
timates,  and  general  linear  functions  of  independent  RVs.  Al¬ 
though  the  approaches  we  have  used  can,  in  principle,  be  used 
for  any  input  data  hypothesis,  we  have  provided  examples  in 
which  the  input  data  is  assumed  to  be  iid  samples  of  Gaussian 
noise.  Because  these  approximations  are  valid  in  the  tails  of  the 
PDF,  they  can  be  used  in  conjunction  with  the  PDF  projection 
theorem  (3)  to  provide  PDF  estimates  of  the  input  raw  data  for 
real-world  statistical  hypotheses.  An  application  of  the  method 
has  provided  an  AR  model-order  selection  approach  that  out¬ 
performs  the  MDL  and  Akaike  methods.  The  model  selection 
approach  is  quite  general  and  can,  in  principle,  be  applied  to  any 
model-order  selection  problem,  provided  there  exists  a  well-de¬ 
fined  approximate  sufficient  statistic  or  approximate  sufficient 
statistic  for  each  model  order. 

Appendix 

A.  MATLAB  Implementation  of  Equation  (8) 

The  function  corrpdf(r,  N)  below  returns  the  PDF  value 
for  an  input  vector  r  —  [?'i  •  •  •  ('i  f ,  which  has  been  obtained 
from  N  input  data  samples.  The  program  requires  N  to  be  odd. 
The  algorithm  becomes  numerically  unstable  for  N  greater  than 
about  30,  and  it  is  slow.  A  faster  version,  using  arbitrary  preci¬ 
sion  arithmetic,  has  been  written  in  C,  which  extends  its  use¬ 
fulness  to  higher  N  (as  high  as  500  has  been  tried  success¬ 
fully);  however,  it  is  still  somewhat  impractical.  The  usefulness 
of  these  programs  is  in  the  fact  that  they  are  able  to  validate  the 
asymptotic  form  to  be  presented  in  Section  III. 


function  pdf  =  corrpdf  (r.  N) 

2-  2-  2-  2-  2-  2-  9-  2-  2-  2-  2-  f’nRRPFlIi1  M  2- 2*  2*  2*  2*  2*  2*  2- 2~  2~  2- 2*  2*  2- 2*  2*  2*  2*  2- 2~ 
ooooooooooo  GUi\i\r  \J  r  •  1V1  oooooooooooooooooooo 

%  function  pdf  =  corrpdf(r,  N)  % 

%  Code  developed  by  S .  M.  Kay,  1998  % 

%  Modified  by  P.  M.  Baggenstoss  % 

2-2'2-2-2-2-2'2'2'2-2'2-2-2'2-2-9'2'9'2'2-2'2-2-2-2-2-9'2'2'2'2-2'2-2'2-2-2-9-2'9'2' 

oooooooooooooooooooooooooooooooooooooooooo 

'•  '•(:); 

M  =length(r); 

if  round((iV  —  l)/2)  =  (N  —  l)/2, 
error  ("N  must  be  odd"); 

end 

n  —  (N  —  l)/2; 
lambda  =  zeros  (n,  M)  ; 
for  j  —  1:  n 

lambda(j,  :)  =  cos  (2  *  pi  *  j  *  [1  :  M\/N ) ; 
end 

sm  —  0; 
for  jl  =  1:  n 

sm=  corrloop  (jl,  r,  lambda,  sm)  ; 
end; 

pdf=sm*prod(n  —  [1:  M\); 

2'2'2'2'2-2'2'9'2'2'2-2'2'2'2'2-2'9'2'2'2'2-2'2'2'2'2-9'2'9'2'2'2-2'2'2-2-2-9-2'2'2' 

oooooooooooooooooooooooooooooooooooooooooo 

%  subroutine  corrloop  % 

2'2'2'2-2-2-2'2'2'2-2'2'2'2'2-2-2-9'9'2'2-2'2'2'2'2-2-9'2'9'2'2-2'2'2'2-2-2-9-2'9'2' 

oooooooooooooooooooooooooooooooooooooooooo 

function  term=  corrloop  (idx,  r,  lambda  ,  term)  ; 
[n,  M\  —  size  ( lambda)  ; 
to  = length ( idx) ; 
if  (to  >=  M), 

B  = lambda (idx,  1:  M ) ; 
alpha  =  B'\r; 

if  min  (alpha)  >  0  &  sum (alpha)  <  1, 

A  —  [1  r' ;  ones  (M,  1  )B\; 
da  =  det  (A) ; 

C=  1; 

for  j  —  1:  n 

if  sum  (j  ==idx)  ==  0, 

D  —  [1  1  ambda!,/.  1:  M);  ones (M,  1  )B\; 

C  —  C*  (det  (D)/ da)  ; 
end 

end 

term  =  term+det  (A) A(— l)*sign(det  (B))/C; 
end 

else, 

for  i  —  idx  (to)  +  1 :  n, 

term  =  corrloop([idx;  /].  r,  lambda,  term) ; 
end; 
end; 
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